Universal quantum data compression via gentle tomography 
Quantum state tomography, the practice of estimating a quantum state by performing measurements on it, is useful in a variety of contexts. We introduce "gentle tomography" as a version of tomography that preserves the measured quantum data. As an application of gentle tomography, we describe a polynomial-time method for universal source coding.
I. INTRODUCTION



Suppose that we have a sequence of quantum states, each drawn from an ensemble with known density matrix $\rho$. Schumacher compression then allows the sequence to be efficiently encoded so that $S(\rho) = -\operatorname{tr}\rho\log_2\rho$ qubits are required to encode each state in the limit that the length of the sequence goes to infinity. This resembles classical source coding, in which a source can be compressed to a rate asymptotically approaching its Shannon entropy. However, classical compression can be performed by algorithms that are universal (they do not depend on a description of the source) and efficient (they have running time polynomial in the length of the input). In contrast, most existing quantum compression algorithms either rely on knowing the basis in which $\rho$ is diagonal [1] or have no known polynomial time implementations.



This paper presents an efficient, universal quantum data compression algorithm; that is, it can compress an unknown i.i.d. quantum source $\rho^{\otimes n}$ in poly(n) time to a rate converging to its von Neumann entropy $S(\rho)$, with error approaching zero as the number of copies, $n$, increases. Another efficient universal quantum data compression algorithm was presented in [5], but our algorithm has the advantages of simplicity and a better rate-disturbance tradeoff.

Our algorithm consists of two parts: a weak measurement of $\rho^{\otimes n}$ that estimates $\rho$ accurately without causing very much damage to the state, followed by compressing $\rho^{\otimes n}$ based on this estimate. Conceptually, this resembles classical methods of compression that determine the empirical distribution of their input in a first pass over the data and perform the compression in a second pass. The only new difficulties we will encounter in the quantum case come from the need to perform state tomography on $\rho^{\otimes n}$ without causing very much damage and to compress $\rho^{\otimes n}$ based on an imperfect estimate.
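As a purely classical illustration of the two-pass idea, the following minimal Python sketch estimates the empirical distribution in one pass and then charges each symbol its ideal code length $-\log_2 p(x)$ in a second pass. The function names and the smoothing parameter `delta` are hypothetical; the sketch reports ideal code lengths rather than running a real entropy coder, and it ignores the cost of transmitting the estimated distribution.

```python
import math
import random
from collections import Counter

def empirical_entropy(data):
    """Shannon entropy (bits/symbol) of the empirical distribution of data."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def two_pass_code_length(data, alphabet, delta=1e-3):
    """Total ideal code length in bits of a two-pass scheme.

    Pass 1 estimates the empirical distribution; pass 2 charges each symbol
    -log2 p(x). The estimate is mixed with the uniform distribution by the
    (hypothetical) parameter delta so that no symbol gets probability zero.
    """
    n = len(data)
    counts = Counter(data)                                  # pass 1
    d = len(alphabet)
    p = {a: (1 - delta) * counts[a] / n + delta / d for a in alphabet}
    return sum(-math.log2(p[x]) for x in data)              # pass 2

random.seed(0)
data = random.choices("ab", weights=[0.9, 0.1], k=10_000)
rate = two_pass_code_length(data, "ab") / len(data)
h = empirical_entropy(data)
print(f"{rate:.5f} bits/symbol vs empirical entropy {h:.5f}")
```

The per-symbol overhead is the relative entropy between the empirical distribution and the smoothed estimate, which is tiny here; the smoothing toward uniform plays the same role that mixing with $I/d$ plays later for quantum states.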



II. GENTLE TOMOGRAPHY 

The problem of weakly measuring states of the form $\rho^{\otimes n}$ was introduced in [?] and further developed in [?, ?]. While it is impossible to measure a single state $\rho$ without causing disturbance, we expect ordinary classical logic to apply to $\rho^{\otimes n}$ when $n$ is large, so that it is possible to measure even non-commuting observables precisely with little disturbance. For example, in nuclear magnetic resonance (NMR), the total x-magnetization of $n = O(10^{20})$ nuclear spins is continuously measured without causing decoherence by a probe consisting of a coil of wire around the sample. This is possible because the measurement does not precisely determine the number of nuclear spins pointing in the x direction, but only gives a crude estimate of that quantity. In this section, we will introduce a procedure for state tomography on $\rho^{\otimes n}$ and then show how to modify it so that its disturbance vanishes for large $n$ while at the same time it yields an asymptotically accurate estimate of $\rho$.


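Let $\{\sigma_k\}_{k=1}^{d^2-1}$ be an orthonormal ($\operatorname{tr}\sigma_j\sigma_k = \delta_{jk}$) basis of traceless Hermitian $d\times d$ matrices, and write the density matrix $\rho$ as $\rho = I/d + \sum_k (\operatorname{tr}\rho\sigma_k)\,\sigma_k$. Estimating $\rho$ reduces to estimating the $d^2-1$ quantities $\operatorname{tr}\rho\sigma_k$. If we now diagonalize $\sigma_k$ as $\sigma_k = \sum_{i=1}^{d}\lambda_i|v_i\rangle\langle v_i|$, then $\operatorname{tr}\rho\sigma_k = \sum_i \lambda_i\langle v_i|\rho|v_i\rangle$, so state tomography reduces to estimating $d(d^2-1)$ quantities of the form $\langle\phi|\rho|\phi\rangle$ and then performing a classical computation.¹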
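The reduction above can be checked numerically. The sketch below, assuming NumPy, builds one concrete choice of orthonormal basis $\{\sigma_k\}$ (generalized Gell-Mann matrices rescaled so that $\operatorname{tr}\sigma_j\sigma_k = \delta_{jk}$; the text does not fix a particular basis) and rebuilds $\rho$ from the $d(d^2-1)$ numbers $\langle v_i|\rho|v_i\rangle$. Here those numbers are computed exactly, whereas in the protocol they would come from gentle measurements; the function names are illustrative.

```python
import numpy as np

def traceless_basis(d):
    """Orthonormal basis {sigma_k} of traceless Hermitian d x d matrices:
    generalized Gell-Mann matrices scaled so that tr(sigma_j sigma_k) = delta_jk."""
    basis = []
    for j in range(d):
        for k in range(j + 1, d):
            sym = np.zeros((d, d), dtype=complex)
            sym[j, k] = sym[k, j] = 1 / np.sqrt(2)
            asym = np.zeros((d, d), dtype=complex)
            asym[j, k], asym[k, j] = -1j / np.sqrt(2), 1j / np.sqrt(2)
            basis += [sym, asym]
    for l in range(1, d):
        diag = np.zeros(d)
        diag[:l], diag[l] = 1.0, -l
        basis.append(np.diag(diag / np.sqrt(l * (l + 1))).astype(complex))
    return basis  # d^2 - 1 matrices

def reconstruct(rho, basis):
    """Rebuild rho from the numbers <v_i|rho|v_i>, where |v_i> runs over the
    eigenvectors of each sigma_k (exact values here, estimates in the protocol)."""
    d = rho.shape[0]
    est = np.eye(d, dtype=complex) / d
    for sigma in basis:
        lam, vecs = np.linalg.eigh(sigma)                   # sigma = sum_i lam_i |v_i><v_i|
        overlaps = [np.real(v.conj() @ rho @ v) for v in vecs.T]
        est += np.dot(lam, overlaps) * sigma                # (tr rho sigma_k) sigma_k
    return est

rng = np.random.default_rng(1)
g = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = g @ g.conj().T
rho /= np.trace(rho).real
basis = traceless_basis(3)
print(len(basis), np.allclose(reconstruct(rho, basis), rho))
```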

If we didn't mind damaging the state, then one method of estimating $\alpha := \langle\phi|\rho|\phi\rangle$ would be to apply the projective measurement $\{|\phi\rangle\langle\phi|,\ I - |\phi\rangle\langle\phi|\}$ to each copy of $\rho$. The number of occurrences of $|\phi\rangle\langle\phi|$ would be binomially distributed with mean $n\alpha$ and variance $n\alpha(1-\alpha) \le n/4$, so we could reliably estimate $\alpha$ to an accuracy of $O(n^{-1/2})$. Of course, this measurement would drastically damage some states, such as $\frac{1}{\sqrt 2}(|\phi\rangle + |\phi^\perp\rangle)$.
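A short Monte Carlo sketch of this destructive strategy (plain Python; the helper name `estimate_alpha` and the parameter values are illustrative) confirms that the root-mean-square error of the estimate of $\alpha$ matches the binomial prediction $\sqrt{\alpha(1-\alpha)/n} \le 1/(2\sqrt n)$.

```python
import math
import random

def estimate_alpha(alpha, n, rng):
    """Destructive estimate of alpha = <phi|rho|phi>: measure each of n copies in
    {|phi><phi|, I - |phi><phi|}; each copy gives |phi><phi| with probability alpha."""
    hits = sum(rng.random() < alpha for _ in range(n))
    return hits / n

rng = random.Random(0)
alpha, n, trials = 0.3, 400, 1000
errors = [estimate_alpha(alpha, n, rng) - alpha for _ in range(trials)]
rms = math.sqrt(sum(e * e for e in errors) / trials)
predicted = math.sqrt(alpha * (1 - alpha) / n)   # binomial standard deviation / n
print(f"rms error {rms:.4f}, predicted {predicted:.4f}, bound {0.5 / math.sqrt(n):.4f}")
```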
¹ Modifying our techniques to estimate only the $d^2-1$ quantities $\operatorname{tr}\rho\sigma_k$ would cause the state estimate to converge more quickly, but this would make our exposition slightly more complicated. Unfortunately, there is no known polynomial time implementation of quantum state tomography that has the probability of large deviations vanish at the asymptotically optimal rate.






Instead of measuring each state individually, we can also express this measurement as a collective operation on all $n$ states simultaneously. It is given by the operators

$$M_k = \sum_{\substack{x\in\{0,1\}^n\\ |x| = k}} \bigotimes_{i=1}^{n}\Big(x_i\,|\phi\rangle\langle\phi| + (1-x_i)\,\big(I - |\phi\rangle\langle\phi|\big)\Big), \tag{1}$$

where $k$ ranges from 0 to $n$ and $|x|$ denotes the number of 1s in the $n$-bit string $x$. Clearly, measuring $\{M_k\}$ yields the same statistics as measuring each state individually and counting the $|\phi\rangle\langle\phi|$ outcomes. The measurement can also be constructed efficiently: we unitarily count the number of occurrences of $|\phi\rangle$ in the $n$ states in an ancilla register and then measure the ancilla (see Fig. 1).
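For very small $n$ the operators of Eq. (1) can be built explicitly. The NumPy sketch below (the function name `counting_povm` and the example state are hypothetical) checks that the $M_k$ are projectors summing to the identity and that $\operatorname{tr}M_k\rho^{\otimes n} = \binom{n}{k}\alpha^k(1-\alpha)^{n-k}$ with $\alpha = \langle\phi|\rho|\phi\rangle$, i.e. that the collective measurement reproduces the individual-measurement statistics. It uses exponential memory and is only a check, not the efficient counting circuit of Fig. 1.

```python
import itertools
from functools import reduce
from math import comb
import numpy as np

def counting_povm(phi, n):
    """The operators M_k of Eq. (1), k = 0..n, as explicit d^n x d^n matrices."""
    phi = np.asarray(phi, dtype=complex)
    phi = phi / np.linalg.norm(phi)
    P = np.outer(phi, phi.conj())            # |phi><phi|
    Q = np.eye(len(phi)) - P                 # I - |phi><phi|
    dim = len(phi) ** n
    M = [np.zeros((dim, dim), dtype=complex) for _ in range(n + 1)]
    for x in itertools.product((0, 1), repeat=n):
        # tensor product over i of (x_i P + (1 - x_i) Q), added to M_{|x|}
        M[sum(x)] += reduce(np.kron, [P if xi else Q for xi in x])
    return M

n = 4
phi = np.array([1.0, 1.0j])
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
M = counting_povm(phi, n)
rho_n = reduce(np.kron, [rho] * n)
alpha = np.real(phi.conj() @ rho @ phi) / np.real(phi.conj() @ phi)
probs = [np.real(np.trace(Mk @ rho_n)) for Mk in M]
binom = [comb(n, k) * alpha**k * (1 - alpha) ** (n - k) for k in range(n + 1)]
print(np.round(probs, 6), np.round(binom, 6))
```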
FIG. 1: Circuit for performing the measurement in Eq. (1). The controlled-(+1) operations map $|\phi\rangle|x\rangle$ to $|\phi\rangle|x+1\rangle$ for any value $x$ of the target and leave other states unchanged.

Unfortunately, even the collective measurement in Eq. (1) causes substantial damage to the state. For example, if the measurement $\{M_k\}$ is repeated, then the distribution of $k$ will have a variance of $O(n)$ the first time and zero on subsequent measurements.²

In [?], this problem was solved by initializing the ancilla in Fig. 1 to a state proportional to $\sum_k e^{-k^2/2\Delta^2}|k\rangle$ instead of $|0\rangle$. The measurement of $k$ then has variance $\Delta^2 + O(n)$, and it can be shown [?] that the damage to $\rho^{\otimes n}$ is $O(n/(\Delta^2+n))$. Ref. [?] proposed a method that causes more damage to the state but is easier to analyze for our purposes.

To implement the gentle measurement of [?], we will divide up the range $0,\dots,n$ into $m$ bins, with boundaries $0 = b_0 < b_1 < \cdots < b_m = n+1$. Then we will modify the collective measurement of Eq. (1) to measure only the bin that the state lies in instead of determining the exact value of $k$. The new measurement $\{M'_j\}$ is given in terms of the $M_k$ of Eq. (1) by

$$M'_j = \sum_{b_{j-1}\le k < b_j} M_k, \tag{2}$$

where $j$ ranges from 1 to $m$.

² This damage is sometimes useful. In [?], it is used as the first step of entanglement concentration. In fact, our compression protocol may be thought of as the gentle analogue of [?] in the same way that the compression scheme in [?] is the gentle analogue of the entanglement concentration procedure of [?].

If the bin size, $n/m$, is much larger than the $O(\sqrt n)$ width of the distribution of $k$, then we expect to project onto a measurement outcome that contains almost all of the support of $\rho^{\otimes n}$, thereby causing little disturbance. Since we want to avoid having a bin boundary within $O(\sqrt n)$ of the mean $n\alpha$, for any choice of $\rho$, we will choose the $b_i$ uniformly at random from between 0 and $n$.
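The random choice of boundaries and the classical BIN step can be sketched in a few lines of Python. Drawing the $m-1$ interior boundaries without replacement from $\{1,\dots,n\}$ is our own concrete reading of "uniformly at random from between 0 and $n$", and the helper names `random_boundaries` and `bin_of` are hypothetical.

```python
import bisect
import random

def random_boundaries(n, m, rng):
    """Boundaries 0 = b_0 < b_1 < ... < b_m = n + 1; the m - 1 interior ones are
    drawn uniformly without replacement from {1, ..., n} (requires m - 1 <= n)."""
    interior = sorted(rng.sample(range(1, n + 1), m - 1))
    return [0] + interior + [n + 1]

def bin_of(k, b):
    """Classical BIN step: return the j (1 <= j <= m) with b[j-1] <= k < b[j]."""
    return bisect.bisect_right(b, k)

rng = random.Random(7)
n, s = 10_000, 0.3
m = round(n ** s)                      # m = n^s bins
b = random_boundaries(n, m, rng)
k = 3_021                              # e.g. a count observed near n * alpha = 3000
j = bin_of(k, b)
print(m, j, (b[j - 1], b[j]))
```

`bisect_right` returns the first index whose boundary exceeds $k$, which is exactly the $j$ of Eq. (2) because $b_0 = 0 \le k < n+1 = b_m$.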

The choice of $m$ now defines a trade-off between disturbance caused to $\rho^{\otimes n}$ and information gained about $\rho$. Choosing a smaller $m$ means that each bin is larger, so that a measurement outcome lets us infer less about $\rho$, but we have a smaller probability of damaging $\rho^{\otimes n}$ by projecting onto only part of its support.

Proposition 1. The measurement $\{M'_j\}$ described above can be implemented in $O(n)$ gates. If we choose $m = n^s$ for $0 < s < 1/2$, then the measurement will fail with probability $O(n^{s-1/2}\ln n)$. Upon success, the measurement outcome is within $O(n^{1-s}\ln n)$ of $n\alpha$, and the disturbance (in the sense of entanglement fidelity) is less than $\exp(-\Omega(\ln^2 n)) < O(n^{-p})$ for any constant $p$.

Proof of Proposition 1:

We begin by describing how to implement $\{M'_j\}$. First we count the number of times $|\phi\rangle$ occurs in $\rho^{\otimes n}$ and store the result $k \in \{0,\dots,n\}$ in an ancilla register. Then we perform a classical computation to determine which bin $j$ contains the result $k$. We measure $j$, thus implementing the projective measurement $\{M'_j\}$, and then uncompute $j$ and finally uncompute $k$. This is demonstrated in Fig. 2.
FIG. 2: Circuit for performing the gentle measurement in Eq. (2). The controlled-(+1) and controlled-(−1) operations act on the target only when the control is in the state $|\phi\rangle$. The gate BIN classically computes which bin contains the top register and stores it in the bottom register.

We define three possible causes of failure: i) some $b_i$ will be too close to $n\alpha$ (within $n^{1/2}\ln n$), ii) on one of the two sides of $n\alpha$ there won't be any $b_i$ within $n^{1-s}\ln n$, and iii) measuring $\{M'_j\}$ will yield a bin that does not contain $n\alpha$. Using the union bound, we can show that i) has probability $\le m(2n^{1/2}\ln n)/n = 2n^{s-1/2}\ln n$. Next, the probability that no $b_i$ is in $[n\alpha - n^{1-s}\ln n,\ n\alpha]$ is $\le (1 - n^{-s}\ln n)^m \le e^{-\ln n} = n^{-1}$, and likewise for the interval $[n\alpha,\ n\alpha + n^{1-s}\ln n]$, so the probability of ii) is $\le 2/n$. Finally, if we are given that no bin boundary is within $n^{1/2}\ln n$ of $n\alpha$ (i.e. i) hasn't occurred), then using a Chernoff bound we can show that the probability of iii) is less than $\exp(-\Omega(\ln^2 n))$. Thus, the probability of failure is dominated by the probability of i), which is $O(n^{s-1/2}\ln n)$.

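We say that the gentle measurement is successful if none of i), ii) or iii) occur. In this case, we can take as our estimate for $\alpha$ an arbitrary value within the bin we have measured (divided by $n$), and by ii) we will err by no more than $2n^{-s}\ln n$. Finally, let $M'_j$ be the measurement outcome we obtain, let $|\psi\rangle^{AB}$ be a purification of $\rho^{\otimes n}$, and define $\pi := M'_j\otimes I_B$. Then the post-measurement state is $|\psi'\rangle = \pi|\psi\rangle/\sqrt{\langle\psi|\pi|\psi\rangle}$, and the entanglement fidelity is $F_e = \langle\psi|\psi'\rangle = \langle\psi|\pi|\psi\rangle/\sqrt{\langle\psi|\pi|\psi\rangle} = \sqrt{\langle\psi|\pi|\psi\rangle}$. From iii) we have $\langle\psi|\pi|\psi\rangle \ge 1-\epsilon$, where $\epsilon = \exp(-\Omega(\ln^2 n))$, so $F_e \ge \sqrt{1-\epsilon} \ge 1 - \exp(-\Omega(\ln^2 n))$.³ ∎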
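Because $\langle\psi|\pi|\psi\rangle = \operatorname{tr}M'_j\rho^{\otimes n}$ is just the binomial probability that $k$ lands in the measured bin, the fidelity can be evaluated exactly for moderate $n$. In this plain-Python sketch (names and numbers are illustrative, and $0 < \alpha < 1$ is assumed), a bin extending $n^{1/2}\ln n$ on both sides of $n\alpha$ is essentially undisturbed, whereas a boundary sitting exactly at $n\alpha$ (failure mode i)) reduces $F_e$ to roughly $1/\sqrt 2$.

```python
import math

def binom_logpmf(k, n, a):
    """log of C(n, k) a^k (1 - a)^(n - k), via lgamma so that large n is fine (0 < a < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(a) + (n - k) * math.log1p(-a))

def bin_fidelity(n, alpha, lo, hi):
    """F_e = sqrt(<psi|pi|psi>) = sqrt(tr M'_j rho^{(x)n}) for the bin lo <= k < hi."""
    p = sum(math.exp(binom_logpmf(k, n, alpha))
            for k in range(max(lo, 0), min(hi, n + 1)))
    return math.sqrt(min(p, 1.0))   # clip tiny floating-point overshoot

n, alpha = 10_000, 0.3
mean = round(n * alpha)                       # n * alpha = 3000
half = round(math.sqrt(n) * math.log(n))      # n^{1/2} ln n, about 921
f_good = bin_fidelity(n, alpha, mean - half, mean + half)  # boundaries far from n*alpha
f_bad = bin_fidelity(n, alpha, mean, mean + 1000)           # boundary at n*alpha: failure mode i)
print(f"{f_good:.12f} {f_bad:.4f}")
```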

To perform gentle tomography, we simply divide the $n$ states into $d(d^2-1)$ blocks of length $\ell = \lfloor n/(d(d^2-1))\rfloor$ and gently measure each block. If $\{|v_i^{(k)}\rangle\}_{i=1}^{d}$ is the eigenbasis of $\sigma_k$, then we can index the blocks by $i = 1,\dots,d$ and $k = 1,\dots,d^2-1$ and measure $|v_i^{(k)}\rangle$ on block $(i,k)$.
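The block bookkeeping is simple; the following Python sketch (the function name `tomography_blocks` is hypothetical) assigns consecutive copies to the labels $(i,k)$. How the $n \bmod d(d^2-1)$ leftover copies are treated is not specified in the text; here they are simply not measured.

```python
def tomography_blocks(n, d):
    """Partition copies 0..n-1 into d(d^2 - 1) blocks of length l = n // (d(d^2 - 1)),
    labelled (i, k) with i = 1..d and k = 1..d^2 - 1. Block (i, k) is the one gently
    measured against |v_i^(k)>. The n mod d(d^2 - 1) leftover copies are simply not
    measured in this sketch."""
    nblocks = d * (d * d - 1)
    l = n // nblocks
    blocks = {}
    labels = ((k, i) for k in range(1, d * d) for i in range(1, d + 1))
    for idx, (k, i) in enumerate(labels):
        blocks[(i, k)] = range(idx * l, (idx + 1) * l)
    return l, blocks

l, blocks = tomography_blocks(1000, 2)
print(l, len(blocks), blocks[(1, 1)], blocks[(2, 3)])
```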



Proposition 2 (Gentle tomography). For any $0 < s < 1/2$ and fixed Hilbert space dimension, applying the procedure described above to $\rho^{\otimes n}$ requires poly(n) time and fails with probability $O(n^{s-1/2}\ln n)$. Upon success, the disturbance is less than $O(n^{-2})$ and the estimate $\hat\rho$ satisfies $\|\hat\rho - \rho\|_1 \le O(n^{-s}\ln n)$.



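Proof: We say that tomography succeeds when each of the $d(d^2-1)$ measurements succeeds individually. Since the dimension $d$ is a constant, each block has length $\ell = \Theta(n)$, and we can use Proposition 1 together with the union bound to bound the failure probability by $d(d^2-1)\cdot O(\ell^{s-1/2}\ln\ell) = O(n^{s-1/2}\ln n)$ and the state disturbance by $d(d^2-1)\cdot O(\ell^{-2}) = O(n^{-2})$.

We still need to describe how to form an accurate estimate $\hat\rho$. Assume that each gentle measurement has succeeded. Then the $d(d^2-1)$ gentle measurements output not state estimates, but bins $(b_1, b_2, |\phi\rangle)$, guaranteeing only that $b_1 \le \langle\phi|\rho|\phi\rangle \le b_2$ (with the bin boundaries divided by the block length). We will try to find a state $\hat\rho$ that is consistent with each bin. Since $\rho$ is consistent with each bin, we know that some such $\hat\rho$ exists. We can find it efficiently by solving a semidefinite program for $\hat\rho$ given by the constraints $\hat\rho \ge 0$, $\operatorname{tr}\hat\rho = 1$, and $b_1 \le \langle\phi|\hat\rho|\phi\rangle \le b_2$ for each bin $(b_1, b_2, |\phi\rangle)$.⁴

Given such a $\hat\rho$, we have for each gentle measurement that $|\langle\phi|(\rho-\hat\rho)|\phi\rangle| \le \epsilon$, where $\epsilon = \Theta(n^{-s}\ln n)$. Then if $\sigma_k = \sum_i\lambda_i|v_i\rangle\langle v_i|$, we have $|\operatorname{tr}(\rho-\hat\rho)\sigma_k| = |\sum_i\lambda_i\langle v_i|(\rho-\hat\rho)|v_i\rangle| \le \epsilon\sum_i|\lambda_i| \le \sqrt d\,\epsilon$, where the last step uses $\sum_i\lambda_i^2 = \operatorname{tr}\sigma_k^2 = 1$. Thus, by the Cauchy-Schwarz inequality,

$$\|\rho-\hat\rho\|_1 \le \sqrt d\,\|\rho-\hat\rho\|_2 = \sqrt d\,\sqrt{\sum_k\big(\operatorname{tr}(\rho-\hat\rho)\sigma_k\big)^2} \le d\sqrt{d^2-1}\,\epsilon < d^2\epsilon.$$

Since $d$ is constant, this gives $\|\rho-\hat\rho\|_1 = O(n^{-s}\ln n)$. ∎

³ A similar result was proved in Lemma 9 of [?].

⁴ If one of the gentle measurements fails, this semidefinite program may fail or it may report a totally erroneous answer.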
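The text obtains $\hat\rho$ from a semidefinite feasibility problem. As a dependency-light stand-in (explicitly a different technique, not that SDP), the NumPy sketch below takes the midpoint of each bin, rebuilds a Hermitian trace-one matrix with the linear formula $I/d + \sum_k(\operatorname{tr}\hat\rho\sigma_k)\sigma_k$, and then replaces it by the nearest density matrix in Hilbert-Schmidt norm by projecting its eigenvalues onto the probability simplex. The result need not satisfy every bin constraint exactly, but since that projection cannot increase the Hilbert-Schmidt distance to the true $\rho$, the same Cauchy-Schwarz chain still gives $\|\rho-\hat\rho\|_1 \le d\sqrt{d^2-1}\,\epsilon/2$ for bins of width $\epsilon$. The example uses $d = 2$ with normalized Pauli matrices; all names and numbers are illustrative.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a real vector onto {p : p >= 0, sum(p) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    r = ks[u - css / ks > 0][-1]
    return np.maximum(v - css[r - 1] / r, 0.0)

def estimate_from_bins(bins, basis):
    """bins[k][i] = (lo, hi) brackets <v_i^(k)|rho|v_i^(k)>, where v_i^(k) are the
    eigenvectors returned by np.linalg.eigh(basis[k]). Midpoints -> linear
    reconstruction -> nearest density matrix (NOT the SDP described in the text)."""
    d = basis[0].shape[0]
    est = np.eye(d, dtype=complex) / d
    for sigma, bounds in zip(basis, bins):
        lam, _ = np.linalg.eigh(sigma)
        mids = [(lo + hi) / 2 for lo, hi in bounds]
        est += np.dot(lam, mids) * sigma
    w, V = np.linalg.eigh(est)
    return (V * project_to_simplex(w)) @ V.conj().T

# Qubit example: normalized Pauli matrices, a pure state, bins of width eps.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
basis = [P / np.sqrt(2) for P in (X, Y, Z)]
psi = np.array([np.cos(0.4), np.sin(0.4)], dtype=complex)
rho = np.outer(psi, psi.conj())
eps, rng = 0.05, np.random.default_rng(0)
bins = []
for sigma in basis:
    _, vecs = np.linalg.eigh(sigma)
    true = [np.real(v.conj() @ rho @ v) for v in vecs.T]
    offsets = rng.uniform(0, eps, size=len(true))   # bin [t - o, t - o + eps] contains t
    bins.append([(t - o, t - o + eps) for t, o in zip(true, offsets)])
rho_hat = estimate_from_bins(bins, basis)
err = np.abs(np.linalg.eigvalsh(rho_hat - rho)).sum()   # trace-norm distance
print(err, 2 * np.sqrt(3) * eps / 2)
```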



This extends our trade-off curve for gentle measure- 
ments to full gentle state tomography. It is an interest- 
ing question whether the tradeoff we have found between 
accuracy and probability of failure is optimal up to log- 
arithmic factors. 



III. UNIVERSAL COMPRESSION 

Now we look more closely at the quantum coding. Schumacher compression works by identifying the eigenvalues and eigenvectors of $\rho$, then coherently performing classical Shannon compression on sequences of those eigenvectors with probabilities given by the corresponding eigenvalues. However, we are forced to operate with only an estimate $\hat\rho \approx \rho$, so we will need to use a data compression scheme that deals well with small inaccuracies in the state estimate.

This case has been analyzed in [?], which found that compressing $\rho$ in the basis $\{|i\rangle\}$ with any classical algorithm gives an asymptotic rate of $R = -\sum_i\langle i|\rho|i\rangle\log\langle i|\rho|i\rangle$. This is because compressing $\rho$ faithfully reduces to compressing the diagonal entries of $\rho$ in an arbitrary basis $\{|i\rangle\}$. Due to the nonnegativity of the relative entropy ($S(\rho\|\sigma) = \operatorname{tr}\rho(\log\rho - \log\sigma) \ge 0$), we have $R \le -\operatorname{tr}\rho\log\sigma = S(\rho) + S(\rho\|\sigma)$ for any density matrix $\sigma$ that can be diagonalized as $\sigma = \sum_i p_i|i\rangle\langle i|$. Thus, for any density matrix $\sigma$, we can encode $\rho$ by working in the eigenbasis of $\sigma$ and then using a classical reversible algorithm. This will achieve a rate $R \le S(\rho) + S(\rho\|\sigma)$.
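The inequality $R \le -\operatorname{tr}\rho\log\sigma = S(\rho) + S(\rho\|\sigma)$ is easy to check numerically. The NumPy sketch below (random $3\times 3$ states; helper names are illustrative) computes the classical rate $R$ obtained by coding the diagonal of $\rho$ in the eigenbasis of $\sigma$ and compares it with $-\operatorname{tr}\rho\log_2\sigma$; all rates are in bits.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability vector (zeros ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-15]
    return float(-(p * np.log2(p)).sum())

def random_density(d, rng):
    """A random full-rank d x d density matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    r = g @ g.conj().T
    return r / np.trace(r).real

rng = np.random.default_rng(2)
rho, sigma = random_density(3, rng), random_density(3, rng)
w, V = np.linalg.eigh(sigma)                                  # sigma = sum_i w_i |i><i|
diag = np.real(np.einsum("ji,jk,ki->i", V.conj(), rho, V))    # <i|rho|i>
R = entropy_bits(diag)                                        # classical rate in that basis
cross = float(-np.dot(diag, np.log2(w)))                      # -tr(rho log2 sigma)
S_rho = entropy_bits(np.linalg.eigvalsh(rho))                 # von Neumann entropy
rel = cross - S_rho                                           # S(rho || sigma)
print(f"R = {R:.4f} <= -tr rho log sigma = {cross:.4f}; S(rho||sigma) = {rel:.4f}")
```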

Unfortunately, there is no simple bound for $S(\rho\|\hat\rho)$ in terms of $\|\rho-\hat\rho\|_1$; in fact, the relative entropy can be infinite if the support of $\rho$ is not contained within the support of $\hat\rho$. This problem corresponds to the situation in which our state estimate has led the encoder to believe that certain vectors will never appear, so that when it encounters them in $\rho^{\otimes n}$, it has made no provision to deal with them. The solution to this is simple: assume that any input vector has a small, but nonzero, chance of occurring. This means that instead of encoding according to $\hat\rho$, we will use $\rho_\delta := (1-\delta)\hat\rho + \delta I/d$ as our state estimate, for some small $\delta > 0$.
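A qubit example (NumPy; helper names are illustrative) shows why the mixing with $I/d$ is needed: if $\hat\rho = |0\rangle\langle 0|$ while the true state is $|+\rangle\langle +|$, the rate bound $-\operatorname{tr}\rho\log\hat\rho$ is infinite, whereas $-\operatorname{tr}\rho\log\rho_\delta$ is finite and never exceeds $\log_2(d/\delta)$.

```python
import numpy as np

def cross_entropy_bits(rho, sigma):
    """-tr(rho log2 sigma); infinite when supp(rho) is not inside supp(sigma)."""
    w, V = np.linalg.eigh(sigma)
    diag = np.real(np.einsum("ji,jk,ki->i", V.conj(), rho, V))   # <i|rho|i>
    if np.any((w <= 1e-15) & (diag > 1e-15)):
        return float("inf")
    keep = diag > 1e-15
    return float(-np.dot(diag[keep], np.log2(w[keep])))

def smooth(rho_hat, delta):
    """rho_delta = (1 - delta) rho_hat + delta I / d."""
    d = rho_hat.shape[0]
    return (1 - delta) * rho_hat + delta * np.eye(d) / d

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)              # true state |+><+|
rho_hat = np.diag([1.0, 0.0])           # overconfident estimate |0><0|
delta = 0.01
print(cross_entropy_bits(rho, rho_hat))                   # inf
print(cross_entropy_bits(rho, smooth(rho_hat, delta)))    # finite
print(np.log2(2 / delta))                                 # bound log2(d / delta)
```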

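Suppose that after performing gentle tomography $\|\rho-\hat\rho\|_1 \le \epsilon$. Then if $\epsilon$ and $\delta$ are both $\Theta(n^{-s}\log n)$, we can bound the rate by

$$R \le -\operatorname{tr}\rho\log\rho_\delta \le S(\rho_\delta) + O(n^{-s}\log^2 n) \le S(\rho) + O(n^{-s}\log^2 n).$$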


The second inequality follows from the operator inequality $\rho_\delta \ge \delta I/d$ (implying $-\log\rho_\delta \le \log(d/\delta)\,I = O(\log n)\,I$), and the last inequality is due to Fannes' inequality. We have neglected the inefficiency of the classical coding, since we can choose it to be $O(n^{-s})$ and will incur only exponentially small damage for $s < 1/2$.
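For readers who want the intermediate steps, here is one way to fill in the last two inequalities of the rate bound above. This is our own expansion, assuming Hölder's inequality $|\operatorname{tr}AB| \le \|A\|_1\|B\|_\infty$ and Fannes' inequality in the form $|S(\rho)-S(\sigma)| \le T\log d - T\log T$ for $T = \|\rho-\sigma\|_1 \le 1/e$ (which holds here once $n$ is large enough).

```latex
\begin{align*}
T := \|\rho-\rho_\delta\|_1
  &\le \|\rho-\hat\rho\|_1 + \delta\,\|\hat\rho - I/d\|_1
   \le \epsilon + 2\delta = O(n^{-s}\log n),\\
-\operatorname{tr}\rho\log\rho_\delta - S(\rho_\delta)
  &= -\operatorname{tr}(\rho-\rho_\delta)\log\rho_\delta
   \le T\,\|\log\rho_\delta\|_\infty
   \le T\log(d/\delta) = O(n^{-s}\log^2 n),\\
S(\rho_\delta)
  &\le S(\rho) + T\log d - T\log T = S(\rho) + O(n^{-s}\log^2 n).
\end{align*}
```

The factor $\log(d/\delta)$ in the second line and the $-T\log T$ term in the third are where the extra factor of $\log n$ comes from.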

To analyze the errors, note that since we usually cannot tell when tomography has failed, we ought to consider failure to be another form of disturbance. Thus, the $O(n^{s-1/2}\log n)$ probability of failure dominates the state disturbance and the errors from classical coding. This is consistent with the observation in [?] that universal compression schemes have yet to achieve better than a polynomially vanishing error.

Since our compression algorithm outputs a variable number of qubits, damage to the encoded state is not the only possible form of error. Upon failure, our algorithm risks producing a string length well above the $n(S(\rho) + O(n^{-s}\log^2 n))$ qubits we expect; in fact, the only absolute bound we can establish is $n\log d$ qubits. Fortunately, the probability that $\rho^{\otimes n}$ is compressed to $nR$ qubits for $R > S(\rho)$ decreases as $O(\exp(-nK))$ for some constant $K$ depending only on $\rho$ and $R$. Following [?], we define this overflow exponent as

$$K = \lim_{n\to\infty} -\frac{1}{n}\log\Pr\big[\rho^{\otimes n}\text{ yields} \ge nR\text{ qubits}\big].$$

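The codes described in [7] achieve the optimal value of $K$: $K = \min_{\sigma:\,S(\sigma)\ge R} S(\sigma\|\rho)$. In contrast, our algorithm⁵ achieves

$$K = \min_{\sigma:\,S(\sigma)\ge R}\ \frac{1}{d(d^2-1)}\sum_{k=1}^{d^2-1} S\big(\mathcal{M}_k(\sigma)\,\big\|\,\mathcal{M}_k(\rho)\big), \tag{4}$$

where $\mathcal{M}_k$ denotes the operation of measuring in the eigenbasis of $\sigma_k$ (i.e. $\mathcal{M}_k(\rho) = \sum_i |v_i^{(k)}\rangle\langle v_i^{(k)}|\rho|v_i^{(k)}\rangle\langle v_i^{(k)}|$).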

To review, our encoding procedure is:

1. Perform gentle tomography on $\rho^{\otimes n}$ using $n^s$ bins, yielding an estimate $\hat\rho$.

2. Construct a modified estimate $\rho_\delta = (1-\delta)\hat\rho + \delta I/d$ for $\delta = O(n^{-s})$.

3. Encode $\rho^{\otimes n}$ with an efficient classical algorithm (such as arithmetic coding [?]) using the eigenbasis of $\rho_\delta$ as the computational basis.

4. Attach a classical description of $\rho_\delta$ with $O(\sqrt n)$ bits of precision and a $\lceil\log(n\log d)\rceil$-bit register indicating the length of the compressed data.

The decoding procedure is simply to extract the description of $\rho_\delta$ and use it as the basis for a classical decoding algorithm.

⁵ It is possible to gently measure $\operatorname{tr}\rho\sigma_k$ directly, instead of inferring it from $d$ gentle measurements of $\sigma_k$'s eigenvectors. Using this for gentle tomography results in a compression scheme with an overflow exponent $d$ times higher, though still not optimal.

IV. CONCLUSION

We have described a polynomial time algorithm for compressing $\rho^{\otimes n}$ into $n(S(\rho) + O(n^{-s}\log^2 n))$ qubits with error rate $O(n^{s-1/2}\log n)$. This matches the error rate and inefficiency of the proof of [?], though not their overflow exponent. The procedure of [?], on the other hand, can only achieve a compression rate of $S(\rho) + O(n^{-s})$ by incurring an error rate of $O(n^{-1/2+s(1+d^2)})$ (possibly up to logarithmic factors) and an overflow exponent of zero. For example, compressing qubits with constant error is only possible at a rate of $S(\rho) + O(n^{-1/10})$.

More elegant would be a method for ergodic sources analogous to Lempel-Ziv-Welch coding that adaptively created a quantum dictionary and compressed quantum information on the fly. But the method proposed here still allows the coding of sources with unknown statistics to attain the quantum transmission limit for sources with known statistics as the message length approaches infinity.



V. ACKNOWLEDGEMENTS

This work was partially supported by the Hewlett Packard-MIT foundation (HP-MIT), by the ARO under a MURI program, by ARDA via NRO, and by the NSA and ARDA under contract number DAAD19-01-1-06. We are grateful to I. Chuang, K. Matsumoto, R. Schack and B. Schumacher for helpful discussions.